% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dataload_from_aws.R
\name{dataload_from_aws}
\alias{dataload_from_aws}
\title{Utility to load datasets from AWS DMAP Data Commons into memory}
\usage{
dataload_from_aws(
  varnames = .arrow_ds_names[1:3],
  ext = c(".arrow", ".rda")[2],
  fun = c("arrow::read_ipc_file", "load")[2],
  envir = globalenv(),
  mybucket = "dmap-data-commons-oa",
  mybucketfolder = "EJAM",
  folder_local_source = "./data/",
  justchecking = FALSE,
  check_server_even_if_justchecking = TRUE,
  testing = FALSE
)
}
\arguments{
\item{varnames}{character vector of the quoted names of the data objects, like "blockwts" or "quaddata"}

\item{ext}{file extension, like ".arrow" or ".rda"}

\item{fun}{name of the function to use when reading, like "arrow::read_ipc_file" or "load"}

\item{envir}{environment to load the data into, e.g., globalenv() or parent.frame()}

\item{mybucket}{name of the AWS bucket, like "dmap-data-commons-oa"}

\item{mybucketfolder}{folder within the AWS bucket, like "EJAM"}

\item{folder_local_source}{path of folder (not ending in forward slash) to
look in for locally saved copies during development
to avoid waiting for download from a server.}

\item{justchecking}{set to TRUE to get object size (and confirm file is accessible/exists)}

\item{check_server_even_if_justchecking}{set this to FALSE to skip checking the server to see if files are there
when justchecking = TRUE. The server is always checked if justchecking = FALSE.}

\item{testing}{only for testing}
}
\value{
Nothing. It just loads data into the environment given by envir (unless justchecking = TRUE).
}
\description{
Utility to load datasets from AWS DMAP Data Commons into memory
}
\details{
See source code for details.

Note that this tries dataload_from_local() first
(at least during development) to avoid slow downloads.

Also see \url{https://shiny.posit.co/r/articles/improve/scoping/}

These files are public-facing -- no credentials required.

To get info, use \code{EJAM:::dataload_from_aws(justchecking = TRUE)},
\code{EJAM:::datapack("EJAM")}, \code{tables()}, or \code{object.size(quaddata)}.

blockid2fips was used only in state_from_blockid(), which is no longer used by testpoints_n(),
so it is not loaded unless/until needed.
This avoids loading the huge file "blockid2fips" (100MB). Instead, "bgid2fips" (3MB) is used as needed, and it is only about 3\% as large in memory.
blockid2fips was roughly 600 MB in RAM because it stores 8 million block FIPS as text.

Files may include the following:
\itemize{
\item frs               (150 MB .arrow file, approx 700 MB RAM)
\item frs_by_programid  (approx 500 MB RAM)
\item frs_by_sic        (approx  63 MB RAM)
\item frs_by_naics      (approx  60 MB RAM)
\item frs_by_mact
\item quaddata     (168 MB on disk, 229 MB RAM)
\item blockid2fips ( 20 MB on disk, 621 MB RAM!) No longer needed.
\item blockpoints  ( 86 MB on disk, 164 MB RAM)
\item blockwts     ( 31 MB on disk, 196 MB RAM)
\item bgej         (123 MB RAM)
\item bgid2fips    ( 18 MB RAM)
}
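
As a rough illustration of the disk versus RAM sizes listed above, the following sketch loads one of the smaller objects into a separate environment and reports its in-memory size. It is only a sketch: it assumes the EJAM package is installed and the bucket is reachable, \code{e} is a hypothetical scratch environment, and the actual behavior should be confirmed against the source code.

\preformatted{
# Sketch only, not run: assumes EJAM is installed and the AWS bucket is reachable.
e <- new.env()  # hypothetical scratch environment, to keep globalenv() uncluttered

# Load a single small dataset, using the default ext and fun
EJAM:::dataload_from_aws(varnames = "bgid2fips", envir = e)

# If the object arrived, report how much RAM it occupies
if (exists("bgid2fips", envir = e, inherits = FALSE)) {
  print(format(object.size(get("bgid2fips", envir = e)), units = "MB"))
}
}

Using a dedicated environment rather than globalenv() makes it easy to inspect or discard what was loaded. When ext is changed, fun presumably needs to be changed to match (for example ".arrow" with "arrow::read_ipc_file").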
}
\seealso{
\code{\link[=datapack]{datapack()}} \code{\link[=dataload_from_pins]{dataload_from_pins()}} \code{\link[=dataload_from_local]{dataload_from_local()}} \code{\link[=dataload_from_package]{dataload_from_package()}} \code{\link[=indexblocks]{indexblocks()}} \code{\link[=.onAttach]{.onAttach()}}
}
